Discussion: add update_time column to source state table #13437

Closed
wants to merge 2 commits into from

Conversation

StrikeW
Contributor

@StrikeW StrikeW commented Nov 15, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

To improve observability and facilitate the measurement of source freshness
related: https://github.com/risingwavelabs/risingwave-docs/issues/1513

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Contributor

@lmatz lmatz left a comment

LGTM, thanks, cc: @cyliu0

so for each input row, there will be a unique timestamp generated for it, right?

@StrikeW
Contributor Author

StrikeW commented Nov 15, 2023

> LGTM, thanks, cc: @cyliu0
>
> so for each input row, there will be a unique timestamp generated for it, right?

To be precise, the row can stay unchanged: we rely on Debezium's heartbeat events, so if the upstream database hasn't been updated for a long time, the offset in the heartbeat remains the same.

@lmatz
Contributor

lmatz commented Nov 15, 2023

Just leaving a comment for visibility.

I think it is okay to use update_time as the metric that we track in our internal test pipelines.

For example, in a PG -> CDC -> RW pipeline, once all of the data has been written to PG and all of its CDC events have been synced to RW, we take the last commit_ts from PG and the last update_time from RW and compare the two to show whether RW was lagging behind PG.

One problem, though, is that users cannot use this trick to monitor a pipeline where PG is still being updated and CDC is still running, since there is no notion of "all" the data.

Moreover, if offset is a more intuitive concept for users to monitor whether RW is lagging, I feel we'd better use the same metric, e.g. offset, in our internal testing pipeline.

Potential solution: syncing commit_ts to RW and taking the difference between it and update_time seems a more intuitive approach.
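
The comparison described above can be sketched as follows. This is only an illustration, not code from the PR: the function name `replication_lag` and the choice of epoch milliseconds as the unit for both timestamps are assumptions.

```rust
use std::time::Duration;

/// Hypothetical helper: how far RW trails PG, given the last `commit_ts`
/// read from PG and the last `update_time` read from RW.
/// Both inputs are assumed to be Unix epoch milliseconds.
/// Returns `None` when `update_time` is earlier than `commit_ts`,
/// which would point to clock skew between the two systems.
fn replication_lag(pg_commit_ts_ms: u64, rw_update_time_ms: u64) -> Option<Duration> {
    rw_update_time_ms
        .checked_sub(pg_commit_ts_ms)
        .map(Duration::from_millis)
}
```

Returning `Option` rather than a signed number keeps clock skew visible to the caller instead of silently reporting a negative lag.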

@@ -117,9 +117,16 @@ impl Source {
sub_fields: vec![],
type_name: "".to_string(),
};
let update_time = Field {
Collaborator

Will this affect the schema of the already created source/table? Is this change backward-compatible?

Collaborator

If not, I suggest we add the update time field in the json.

Contributor Author

@StrikeW StrikeW Nov 16, 2023

This PR won't be merged. Yesterday we found that Debezium can capture the event time on the upstream, so we decided to report a (process_time - event_time) metric to Prometheus as an indication of lag. FYI
#13440
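
A minimal sketch of the (process_time - event_time) idea, for readers following the discussion. It is not the implementation in #13440: `LagGauge` and `now_ms` are hypothetical names, the gauge is a plain atomic standing in for a real Prometheus gauge, and timestamps are assumed to be epoch milliseconds.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a Prometheus gauge holding the latest lag in ms.
struct LagGauge(AtomicI64);

impl LagGauge {
    const fn new() -> Self {
        LagGauge(AtomicI64::new(0))
    }

    /// Records `process_time - event_time` and returns the recorded value.
    /// A negative result suggests clock skew between upstream and consumer.
    fn observe(&self, process_time_ms: i64, event_time_ms: i64) -> i64 {
        let lag = process_time_ms - event_time_ms;
        self.0.store(lag, Ordering::Relaxed);
        lag
    }

    /// Reads the most recently recorded lag.
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Current wall-clock time in epoch milliseconds (0 if the clock predates the epoch).
fn now_ms() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as i64)
        .unwrap_or(0)
}
```

In use, the consumer would call something like `gauge.observe(now_ms(), event_time_ms)` for each change event, where `event_time_ms` is the upstream event time carried in the event. Unlike the update_time comparison, this works while the upstream is still receiving writes.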

@StrikeW StrikeW changed the title feat: add update_time column to source state table feat: (won't merge) add update_time column to source state table Nov 16, 2023
@StrikeW StrikeW changed the title feat: (won't merge) add update_time column to source state table Discussion: add update_time column to source state table Nov 16, 2023
@StrikeW StrikeW closed this Jan 5, 2024